Smart Paper Notes

SearchMorph: Multi-scale Correlation Iterative Network for Deformable Registration

Xiao Fan , Shuxin Zhuang , Zhemin Zhuang , Shunmin Qiu , Alex Noel Joseph Raj , Yibiao Rong

Categories: Computer Vision

2022-06-27

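Deformable image registration provides dynamic information about images and is essential in medical image analysis. However, because single-temporal brain MR images and multi-temporal echocardiograms have different characteristics, it is difficult to register both accurately with the same algorithm or model. We propose an unsupervised multi-scale correlation iterative registration network (SearchMorph) with three highlights. (1) We introduce cost volumes to strengthen feature correlations and construct correlation pyramids to supply complementary multi-scale correlation information. (2) We design a search module that searches for the registration of features in the multi-scale pyramids. (3) We use a GRU module to iteratively refine the deformation field. The proposed network leads on common single-temporal registration tasks and also solves multi-temporal motion estimation tasks. Experimental results show that our method achieves higher registration accuracy and a lower folding point ratio than state-of-the-art methods.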
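To make the cost volume and correlation pyramid in highlight (1) concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the scaled dot-product correlation, the 2x2 average pooling, and the function names `cost_volume`, `avg_pool2`, and `correlation_pyramid` are all assumptions chosen for illustration.

```python
import math

def cost_volume(feat_a, feat_b):
    """All-pairs correlation between two feature maps (illustrative assumption).

    feat_a, feat_b: H x W grids (nested lists) of C-dimensional feature vectors.
    Returns a dict mapping each position (i, j) in feat_a to an H x W grid of
    scaled dot products with every position in feat_b.
    """
    channels = len(feat_a[0][0])
    scale = 1.0 / math.sqrt(channels)
    volume = {}
    for i, row_a in enumerate(feat_a):
        for j, vec_a in enumerate(row_a):
            volume[(i, j)] = [
                [scale * sum(x * y for x, y in zip(vec_a, vec_b)) for vec_b in row_b]
                for row_b in feat_b
            ]
    return volume

def avg_pool2(grid):
    """Halve a 2D grid with non-overlapping 2x2 average pooling."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]

def correlation_pyramid(volume, levels=2):
    """Pool each per-position correlation grid repeatedly to get coarser scales."""
    pyramid = [volume]
    for _ in range(levels - 1):
        pyramid.append({key: avg_pool2(grid) for key, grid in pyramid[-1].items()})
    return pyramid

# Toy 2x2 single-channel feature maps (made-up values).
feat = [[[1.0], [2.0]], [[3.0], [4.0]]]
volume = cost_volume(feat, feat)
pyramid = correlation_pyramid(volume, levels=2)
```

Each pyramid level keeps one correlation grid per source position, so a search step could look up matches at fine and coarse resolution from the same structure.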


RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan , Noah Brown , Justice Carbajal , Yevgen Chebotar , Joseph Dabis , Chelsea Finn , Keerthana Gopalakrishnan , Karol Hausman , Alex Herzog , Jasmine Hsu

Categories: Robotics | Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning

2022-12-13

By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io


Optimising Chest X-Rays for Image Analysis by Identifying and Removing Confounding Factors

Shahab Aslani , Watjana Lilaonitkul , Vaishnavi Gnanananthan , Divya Raj , Bojidar Rangelov , Alexandra L Young , Yipeng Hu , Paul Taylor , Daniel C Alexander , Joseph Jacob

Categories: Computer Vision | Machine Learning

2022-08-22

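During the COVID-19 pandemic, the sheer volume of imaging performed in an emergency setting for COVID-19 diagnosis led to wide variability in clinical chest X-ray (CXR) acquisitions. This variation appears in the CXR projections used, in the image annotations added, and in the degree of rotation of clinical images. The image analysis community has tried to ease the burden on overstretched radiology departments during the pandemic by developing automated COVID-19 diagnostic algorithms that take CXR imaging as input. Large publicly available CXR datasets have been leveraged to improve deep learning algorithms for COVID-19 diagnosis. However, the variable quality of clinically acquired CXRs in publicly available datasets could have a profound effect on algorithm performance. An algorithm may infer a COVID-19 diagnosis from non-anatomical features of an image, such as image labels. These imaging shortcuts may be dataset-specific and limit the generalisability of AI systems. Understanding and correcting key potential biases in CXR images is therefore an essential first step before CXR image analysis. In this study, we propose a simple and effective step-wise approach to pre-processing a COVID-19 chest X-ray dataset to remove undesired biases. We perform ablation studies to show the impact of each individual step. The results suggest that our proposed pipeline could increase the accuracy of a baseline COVID-19 detection algorithm by up to 13%.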


GATE: Gated Additive Tree Ensemble for Tabular Classification and Regression

Manu Joseph , Harsh Raj

Categories: Machine Learning

2022-07-18

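We propose a novel high-performance, parameter-efficient and computationally efficient deep learning architecture for tabular data, the Gated Additive Tree Ensemble (GATE). GATE uses a GRU-inspired gating mechanism as a feature representation learning unit with a built-in feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention, to predict the desired output. Through experiments on several public datasets (both classification and regression), we demonstrate that GATE is a competitive alternative to SOTA approaches. The code will be uploaded as soon as the paper comes out of review.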


Improving Speech Enhancement through Fine-Grained Speech Characteristics

Muqiao Yang , Joseph Konan , David Bick , Anurag Kumar , Shinji Watanabe , Bhiksha Raj

Categories: Machine Learning

2022-07-01

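Although deep learning based speech enhancement systems have made rapid progress in improving the quality of speech signals, they can still produce outputs that contain artifacts and sound unnatural. We propose a novel approach to speech enhancement that aims to improve the perceptual quality and naturalness of enhanced signals by optimizing for key characteristics of speech. We first identify key acoustic parameters that correlate well with voice quality (e.g. jitter, shimmer, and spectral flux) and then propose objective functions that aim to reduce the difference between clean speech and enhanced speech with respect to these features. The full set of acoustic features is the extended Geneva Acoustic Parameter Set (eGeMAPS), which includes 25 different attributes associated with the perception of speech. Because computing these features is non-differentiable, we first build differentiable estimators of the eGeMAPS features and then use them to fine-tune existing speech enhancement systems. Our approach is generic and can be applied to any existing deep learning based enhancement system to further improve the enhanced speech signals. Experimental results on the Deep Noise Suppression (DNS) Challenge dataset show that our approach can improve state-of-the-art deep learning based enhancement systems.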


Demo: Untethered Haptic Teleoperation for Nuclear Decommissioning using a Low-Power Wireless Control Technology

Joseph Bolarinwa , Alex Smith , Adnan Aijaz , Aleksandar Stanoev , Manuel Giuliani

Categories: Robotics

2022-06-27

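Haptic teleoperation is typically realized over wired networking technologies (e.g., Ethernet), which guarantee the performance of control loops closed over the communication medium, particularly in terms of latency, jitter, and reliability. This demonstration shows the capability of conducting haptic teleoperation over a novel low-power wireless control technology, called Gallop, in a nuclear decommissioning use case. It shows the viability of Gallop for meeting the latency, timeliness, and safety requirements of haptic teleoperation. Evaluation conducted as part of the demonstration shows that Gallop, implemented on an off-the-shelf Bluetooth 5.0 chipset, can replace a conventional wired TCP/IP connection and outperforms a WiFi-based wireless solution in the same use case.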


Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

Chonghan Chen , Qi Jiang , Chih-Hao Wang , Noel Chen , Haohan Wang , Xiang Li , Bhiksha Raj

Categories: Computer Vision | Machine Learning

2022-06-18

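Visual grounding is a task that aims to locate a target object according to a natural language expression. As a multi-modal task, it depends critically on feature interaction between the textual and visual inputs. However, previous solutions mainly process each modality independently before fusing them, which does not take full advantage of the relevant textual information while visual features are being extracted. To better exploit the textual-visual relationship in visual grounding, we propose a Query-conditioned Convolution Module (QCM) that extracts query-aware visual features by incorporating query information into the generation of convolutional kernels. With the proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions. Extensive experiments on three popular visual grounding datasets show that our method achieves state-of-the-art performance. In addition, when used directly for prediction without further multi-modal fusion, the query-aware visual features are informative enough to achieve performance comparable to the latest methods.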
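The idea of generating convolutional kernels from the query can be sketched in a few lines of pure Python. This is only a toy illustration under stated assumptions, not the QCM from the paper: the linear kernel generator, the single-channel image, and the names `generate_kernel`, `conv2d_valid`, and `query_conditioned_conv` are all hypothetical.

```python
def generate_kernel(query, weights, k):
    """Linearly map a query embedding to a k x k kernel (hypothetical generator).

    weights has k * k rows, each the same length as the query embedding.
    """
    flat = [sum(w * q for w, q in zip(row, query)) for row in weights]
    return [flat[r * k:(r + 1) * k] for r in range(k)]

def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + dr][c + dc] * kernel[dr][dc]
                for dr in range(k) for dc in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def query_conditioned_conv(image, query, weights, k):
    """Filter the image with a kernel that depends on the query embedding."""
    return conv2d_valid(image, generate_kernel(query, weights, k))

# Toy inputs (made-up values): the same image filtered under two queries.
image = [[1.0, 2.0], [3.0, 5.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
out_q1 = query_conditioned_conv(image, [1.0, 0.0], weights, k=2)
out_q2 = query_conditioned_conv(image, [0.0, 1.0], weights, k=2)
```

The two queries produce different kernels (main diagonal versus anti-diagonal), so the same image yields different features depending on the query, which is the property the abstract calls query-aware.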


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

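Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are still poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing on linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and more. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes ranging from millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.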


Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III

Categories: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.


Understanding Political Polarisation using Language Models: A dataset and method

Samiran Gode , Supreeth Bare , Bhiksha Raj , Hyungon Yoo

Categories: Natural Language Processing

2023-01-02

Our paper aims to analyze political polarization in the US political system using language models, and thereby help candidates make an informed decision. The availability of this information will help voters understand their candidates' views on the economy, healthcare, education and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a language-model-based method that helps analyze how polarized a candidate is. Our data is divided into two parts, background information and political information about a candidate, since our hypothesis is that the political views of a candidate should be based on reason and be independent of factors such as birthplace, alma mater, etc. We further split this data chronologically into four phases to help understand if and how the polarization among candidates changes. This data has been cleaned to remove biases. To understand the polarization, we begin by showing results from classical language models, Word2Vec and Doc2Vec. We then use more powerful techniques such as the Longformer, a transformer-based encoder, to assimilate more information and find the nearest neighbors of each candidate based on their political views and their background.
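The nearest-neighbor step at the end of the abstract can be illustrated with a short pure-Python sketch. It assumes each candidate is already represented by an embedding vector and that similarity is measured with cosine similarity; the abstract does not state the metric, and the candidate names and vectors below are made-up placeholders, not data from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(name, embeddings, top_k=1):
    """Return the top_k other candidates most similar to `name`."""
    target = embeddings[name]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]

# Hypothetical 2-D embeddings; real encoder outputs would be much larger.
embeddings = {
    "candidate_a": [1.0, 0.0],
    "candidate_b": [0.9, 0.1],
    "candidate_c": [0.0, 1.0],
}
closest_to_a = nearest_neighbors("candidate_a", embeddings)
ranking_for_c = nearest_neighbors("candidate_c", embeddings, top_k=2)
```

Running the same lookup separately on political-view embeddings and on background embeddings would let the two neighbor lists be compared, in the spirit of the hypothesis that views should be independent of background.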
